Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insert plaintext protobuf into Scylla #247

Merged
merged 7 commits into from
Dec 21, 2024
Merged

Conversation

jr1221
Copy link
Contributor

@jr1221 jr1221 commented Nov 23, 2024

Changes

Inserts plaintext protobuf into scylla. Steps:

  • File is created and stored by Odysseus-Daemon with the length delimited protobuf PlaybackData
  • File(s) are HTTP multipart POST to the server
  • The server creates a cache of run IDs indexed as a HashMap where the key is a range of times and value is the associated run.
  • The server parses each file to a list of protobuf points
  • Scylla then fills these points to ClientData objects, filling runID or setting -1 if out of range.
  • Next the batch insert happens

Changes outside of new insert:

  • Made data primary key (dataTypeName, runId)
  • Made Text NON NULL to remove the Nullable requirement
  • Fix issue with batch inserting 0 values

Notes

Any other notes go here

Test Cases

  • Postman the log files from Odysseus-Daemon, observe they make it into DB
  • Amount of data inserted =~ amount of data inserted on upload in same span
  • Data is not duplicated, amount inserted on 2nd time is 0
  • Amount insert on first time is 0 when scylla online
  • Data written to disk before scylla creates run make work #1 is considered invalid.

To Do

  • Figure out batching (may push to [Scylla] - Fix batching logic #244 )
  • Figure out what limits we are gonna hit (other than the above)
  • Investigate performance, do a proper test

Checklist

It can be helpful to check the Checks and Files changed tabs.
Please review the contributor guide and reach out to your Tech Lead if anything is unclear.
Please request reviewers and ping on slack only after you've gone through this whole checklist.

  • All commits are tagged with the ticket number
  • No linting errors / newline at end of file warnings
  • All code follows repository-configured prettier formatting
  • No merge conflicts
  • All checks passing
  • Screenshots of UI changes (see Screenshots section)
  • Remove any non-applicable sections of this template
  • Assign the PR to yourself
  • No package-lock.json changes (unless dependencies have changed)
  • Request reviewers & ping on Slack
  • PR is linked to the ticket (fill in the closes line below)

Closes #243

@jr1221 jr1221 self-assigned this Nov 23, 2024
@jr1221 jr1221 force-pushed the 243-plaintext-inserter branch from f0811ff to 53f2559 Compare December 20, 2024 03:16
@jr1221 jr1221 marked this pull request as ready for review December 20, 2024 04:16
@jr1221 jr1221 requested a review from bracyw December 20, 2024 04:59
@jr1221
Copy link
Contributor Author

jr1221 commented Dec 20, 2024

@jr1221
Copy link
Contributor Author

jr1221 commented Dec 20, 2024

Issues remaing:
There seems to be a memory leak in the batch uploading DB layer, causing massive memory consumption when this is run. Seems to be a stack loop or task related. spawn_blocking does not mitigate it. Plan to fix in #244

@bracyw
Copy link
Contributor

bracyw commented Dec 21, 2024

Issues remaing: There seems to be a memory leak in the batch uploading DB layer, causing massive memory consumption when this is run. Seems to be a stack loop or task related. spawn_blocking does not mitigate it. Plan to fix in #244

Do you want to just keep this open until 244 is finished? If it's broken rn probably no point to merge to develop.

@jr1221
Copy link
Contributor Author

jr1221 commented Dec 21, 2024

It works it just uses a bunch of memory (4gb/hour of data collected). 244 may be a pretty significant shift as well.

Copy link
Contributor

@bracyw bracyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM lmk if you think changing those error msgs would be at all useful

scylla-server/src/db_handler.rs Show resolved Hide resolved
scylla-server/Cargo.toml Show resolved Hide resolved
@jr1221 jr1221 merged commit 2517000 into develop Dec 21, 2024
4 checks passed
@jr1221 jr1221 deleted the 243-plaintext-inserter branch December 21, 2024 04:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Scylla] - Investigate a new CSV handler
2 participants